{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Installation and Imports\n",
    "Please follow the **installation guide** in the [ThoughtSource Readme file](https://github.com/OpenBioLink/ThoughtSource) before using this notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "from cot import Collection\n",
    "from cot.generate import FRAGMENTS\n",
    "from rich.pretty import pprint\n",
    "import json"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Overview\n",
    "The ThoughtSource library offers functionality for: \n",
    "1) Loading datasets\n",
    "2) Generating novel chain-of-thought reasoning data and answers\n",
    "3) Evaluating results\n",
    "4) Visualizing results on a Web Application"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Loading, sampling and saving a dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Loading worldtree...\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "| Name      |   Train |   Valid |   Test |\n",
       "|-----------|---------|---------|--------|\n",
       "| worldtree |    2207 |     496 |   1664 |\n",
       "\n",
       "Not loaded: ['aqua', 'asdiv', 'commonsense_qa', 'entailment_bank', 'gsm8k', 'mawps', 'med_qa', 'med_qa_open', 'medmc_qa', 'mmlu_anatomy', 'mmlu_clinical_knowledge', 'mmlu_college_biology', 'mmlu_college_medicine', 'mmlu_medical_genetics', 'mmlu_professional_medicine', 'open_book_qa', 'pubmed_qa', 'qed', 'strategy_qa', 'svamp']"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# load a dataset to sample from \n",
    "worldtree = Collection([\"worldtree\"], verbose=False)\n",
    "worldtree"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "| Name      |   Train | Valid   | Test   |\n",
       "|-----------|---------|---------|--------|\n",
       "| worldtree |      10 | -       | -      |\n",
       "\n",
       "Not loaded: ['aqua', 'asdiv', 'commonsense_qa', 'entailment_bank', 'gsm8k', 'mawps', 'med_qa', 'med_qa_open', 'medmc_qa', 'mmlu_anatomy', 'mmlu_clinical_knowledge', 'mmlu_college_biology', 'mmlu_college_medicine', 'mmlu_medical_genetics', 'mmlu_professional_medicine', 'open_book_qa', 'pubmed_qa', 'qed', 'strategy_qa', 'svamp']"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Randomly select 10 rows from train split\n",
    "worldtree_10 = worldtree.select(split=\"train\", number_samples=10, random_samples=True, seed=0)\n",
    "worldtree_10"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Loading med_qa...\n",
      "Loading pubmed_qa...\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "| Name      |   Train | Valid   | Test   |\n",
       "|-----------|---------|---------|--------|\n",
       "| med_qa    |     100 | -       | -      |\n",
       "| pubmed_qa |     100 | -       | -      |\n",
       "\n",
       "Not loaded: ['aqua', 'asdiv', 'commonsense_qa', 'entailment_bank', 'gsm8k', 'mawps', 'med_qa_open', 'medmc_qa', 'mmlu_anatomy', 'mmlu_clinical_knowledge', 'mmlu_college_biology', 'mmlu_college_medicine', 'mmlu_medical_genetics', 'mmlu_professional_medicine', 'open_book_qa', 'qed', 'strategy_qa', 'svamp', 'worldtree']"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Note that you can also sample from multiple datasets into one collection:\n",
    "collection_medical = Collection([\"med_qa\", \"pubmed_qa\"], verbose=False)\n",
    "collection_medical_100 = collection_medical.select(split=\"train\", number_samples=100)\n",
    "collection_medical_100"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "4f57e102",
   "metadata": {},
   "source": [
    "## 2. Generating reasoning chains and extracting answers"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "4f57e102",
   "metadata": {},
   "source": [
    "#### Using predefined text snippets"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "4f043028",
   "metadata": {},
   "source": [
    "ThoughtSource comes pre-loaded with a large [collection of text snippets ('prompt fragments')](https://github.com/OpenBioLink/ThoughtSource/blob/main/libs/cot/cot/fragments.json) to elicit chain-of-thought reasoning in large language models and to extract answers from chains-of-thought. Let's see how prompt fragments look like:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"font-weight: bold\">[</span>\n",
       "<span style=\"color: #7fbf7f; text-decoration-color: #7fbf7f\">│   </span><span style=\"font-weight: bold\">(</span><span style=\"color: #008000; text-decoration-color: #008000\">'kojima-01'</span>, <span style=\"color: #008000; text-decoration-color: #008000\">\"Answer: Let's think step by step.\"</span><span style=\"font-weight: bold\">)</span>,\n",
       "<span style=\"color: #7fbf7f; text-decoration-color: #7fbf7f\">│   </span><span style=\"font-weight: bold\">(</span><span style=\"color: #008000; text-decoration-color: #008000\">'kojima-02'</span>, <span style=\"color: #008000; text-decoration-color: #008000\">'Answer: We should think about this step by step.'</span><span style=\"font-weight: bold\">)</span>,\n",
       "<span style=\"color: #7fbf7f; text-decoration-color: #7fbf7f\">│   </span><span style=\"font-weight: bold\">(</span><span style=\"color: #008000; text-decoration-color: #008000\">'kojima-03'</span>, <span style=\"color: #008000; text-decoration-color: #008000\">'Answer: First,'</span><span style=\"font-weight: bold\">)</span>\n",
       "<span style=\"font-weight: bold\">]</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[1m[\u001b[0m\n",
       "\u001b[2;32m│   \u001b[0m\u001b[1m(\u001b[0m\u001b[32m'kojima-01'\u001b[0m, \u001b[32m\"Answer: Let's think step by step.\"\u001b[0m\u001b[1m)\u001b[0m,\n",
       "\u001b[2;32m│   \u001b[0m\u001b[1m(\u001b[0m\u001b[32m'kojima-02'\u001b[0m, \u001b[32m'Answer: We should think about this step by step.'\u001b[0m\u001b[1m)\u001b[0m,\n",
       "\u001b[2;32m│   \u001b[0m\u001b[1m(\u001b[0m\u001b[32m'kojima-03'\u001b[0m, \u001b[32m'Answer: First,'\u001b[0m\u001b[1m)\u001b[0m\n",
       "\u001b[1m]\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Chain of thought prompts\n",
    "pprint(list(FRAGMENTS[\"cot_triggers\"].items())[:3])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"font-weight: bold\">[</span>\n",
       "<span style=\"color: #7fbf7f; text-decoration-color: #7fbf7f\">│   </span><span style=\"font-weight: bold\">(</span><span style=\"color: #008000; text-decoration-color: #008000\">'kojima-03'</span>, <span style=\"color: #008000; text-decoration-color: #008000\">'The answer is'</span><span style=\"font-weight: bold\">)</span>,\n",
       "<span style=\"color: #7fbf7f; text-decoration-color: #7fbf7f\">│   </span><span style=\"font-weight: bold\">(</span><span style=\"color: #008000; text-decoration-color: #008000\">'kojima-numerals'</span>, <span style=\"color: #008000; text-decoration-color: #008000\">'Therefore, the answer (arabic numerals) is'</span><span style=\"font-weight: bold\">)</span>,\n",
       "<span style=\"color: #7fbf7f; text-decoration-color: #7fbf7f\">│   </span><span style=\"font-weight: bold\">(</span><span style=\"color: #008000; text-decoration-color: #008000\">'kojima-yes-no'</span>, <span style=\"color: #008000; text-decoration-color: #008000\">'Therefore, the answer (Yes or No) is'</span><span style=\"font-weight: bold\">)</span>,\n",
       "<span style=\"color: #7fbf7f; text-decoration-color: #7fbf7f\">│   </span><span style=\"font-weight: bold\">(</span><span style=\"color: #008000; text-decoration-color: #008000\">'kojima-A-C'</span>, <span style=\"color: #008000; text-decoration-color: #008000\">'Therefore, among A through C, the answer is'</span><span style=\"font-weight: bold\">)</span>,\n",
       "<span style=\"color: #7fbf7f; text-decoration-color: #7fbf7f\">│   </span><span style=\"font-weight: bold\">(</span><span style=\"color: #008000; text-decoration-color: #008000\">'kojima-A-D'</span>, <span style=\"color: #008000; text-decoration-color: #008000\">'Therefore, among A through D, the answer is'</span><span style=\"font-weight: bold\">)</span>\n",
       "<span style=\"font-weight: bold\">]</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[1m[\u001b[0m\n",
       "\u001b[2;32m│   \u001b[0m\u001b[1m(\u001b[0m\u001b[32m'kojima-03'\u001b[0m, \u001b[32m'The answer is'\u001b[0m\u001b[1m)\u001b[0m,\n",
       "\u001b[2;32m│   \u001b[0m\u001b[1m(\u001b[0m\u001b[32m'kojima-numerals'\u001b[0m, \u001b[32m'Therefore, the answer \u001b[0m\u001b[32m(\u001b[0m\u001b[32marabic numerals\u001b[0m\u001b[32m)\u001b[0m\u001b[32m is'\u001b[0m\u001b[1m)\u001b[0m,\n",
       "\u001b[2;32m│   \u001b[0m\u001b[1m(\u001b[0m\u001b[32m'kojima-yes-no'\u001b[0m, \u001b[32m'Therefore, the answer \u001b[0m\u001b[32m(\u001b[0m\u001b[32mYes or No\u001b[0m\u001b[32m)\u001b[0m\u001b[32m is'\u001b[0m\u001b[1m)\u001b[0m,\n",
       "\u001b[2;32m│   \u001b[0m\u001b[1m(\u001b[0m\u001b[32m'kojima-A-C'\u001b[0m, \u001b[32m'Therefore, among A through C, the answer is'\u001b[0m\u001b[1m)\u001b[0m,\n",
       "\u001b[2;32m│   \u001b[0m\u001b[1m(\u001b[0m\u001b[32m'kojima-A-D'\u001b[0m, \u001b[32m'Therefore, among A through D, the answer is'\u001b[0m\u001b[1m)\u001b[0m\n",
       "\u001b[1m]\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Answer extraction prompts\n",
    "pprint(list(FRAGMENTS[\"answer_extractions\"].items())[2:7])"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "4f57e102",
   "metadata": {},
   "source": [
    "#### Configuration parameters"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[94m \n",
      "    \"instruction_keys\": list(str) - Determines which instruction_keys are used from fragments.json,\n",
      "        the corresponding string will be inserted under \"instruction\" in the fragments. Default: [None] (No instruction)\n",
      "    \"cot_trigger_keys\": list(str) - Determines which cot triggers are used from fragments.json,\n",
      "        the corresponding string will be inserted under \"cot_trigger\" in the fragments. Default: [\"kojima-01\"]\n",
      "    \"answer_extraction_keys\": list(str) - Determines which answer extraction prompts are used from fragments.json,\n",
      "        the corresponding string will be inserted under \"answer\" in the fragments. Default: [\"kojima-01\"]\n",
      "    \"template_cot_generation\": string - is the model input in the text generation step, variables in brackets.\n",
      "        Only variables of this list are allowed: \"instruction\", 'question\", \"answer_choices\", \"cot_trigger\"\n",
      "        Default: {instruction}\\n\\n{question}\\n{answer_choices}\\n\\n{cot_trigger}\n",
      "    \"template_answer_extraction\": string - is the model input in the answer extraction step, variables in brackets.\n",
      "        Only variables of this list are allowed: \"instruction\", 'question\", \"answer_choices\", \"cot_trigger\",\n",
      "        \"cot\", \"answer\"\n",
      "        Default: {instruction}\\n\\n{question}\\n{answer_choices}\\n\\n{cot_trigger}{cot}\\n{answer_extraction}\n",
      "    \"author\" : str - Name of the person responsible for generation, Default: \"\"\n",
      "    \"api_service\" str - Name of the used api service: \"openai\", \"openai_chat\", \"huggingface_hub\", \"huggingface_endpoint\" or \"cohere\".\n",
      "        Plus a mock api service \"mock_api\" for debugging, Default: \"huggingface_hub\"\n",
      "    \"engine\": str -  Name of model used, look at website of api which are\n",
      "        available, e.g. for \"openai\": \"text-davinci-002\", Default: \"google/flan-t5-xl\"\n",
      "    \"temperature\": float - Describes how much randomness is in the generated output,\n",
      "        0.0 means the model will only output the most likely answer, 1.0 means\n",
      "        the model will also output very unlikely answers, defaults to 0\n",
      "    \"max_tokens\": int - Maximum length of output generated by model , Default: 256\n",
      "    \"api_time_interval\": float - Pause between two api calls in seconds, Default: 1.0\n",
      "    \"warn\": bool - Print warnings preventing excessive api usage, Default: True\n",
      "    \n"
     ]
    }
   ],
   "source": [
    "# overview of all available configurations\n",
    "from cot.config import Config as config_overview\n",
    "print(f'\\033[94m {config_overview.__doc__[48:]}')"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Defining your own template (optional)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The default chain of thought generation template as shown above is: \"{instruction}\\n\\n{question}\\n{answer_choices}\\n\\n{cot_trigger}\". </br>\n",
    "You could also define your own template with a different structure and even free text:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "# print(\"Answer this question:\\n{question}\\n{answer_choices}\\n\\nGive a detailed explanation of your answer.\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you define your custom chain of thought generation template, do not forget to also define a custom answer extraction template."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "# print(\"Answer this question:\\n{question}\\n{answer_choices}\\n\\nGive a detailed explanation of your answer.{cot}\\n{answer_extraction}\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "4f57e102",
   "metadata": {},
   "source": [
    "### 2.1 Using a Mock-API to create reasoning chains and extract answers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Configuration of the input and parameters of the language model \n",
    "config={\n",
    "    # We compare three different prompts for the chain of thought generation:\n",
    "    # \"Answer: Let's think step by step.\" and 'Answer: We should think about this step by step.', and \"Answer: First,\" \n",
    "    \"cot_trigger_keys\": ['kojima-01','kojima-02', 'kojima-03'],\n",
    "\n",
    "    # We use the same answer extraction prompt for all three prompts\n",
    "    # \"Therefore, among A through D, the answer is\"\n",
    "    \"answer_extraction_keys\": ['kojima-A-D'], \n",
    "    \n",
    "    \"author\" : \"your_name\",\n",
    "    \"api_service\": \"mock_api\", # <--- We use a mock API here for demonstration purposes of the tutorial.\n",
    "    \"engine\": \"\", \n",
    "    \"temperature\": 0,\n",
    "    \"max_tokens\": 512,\n",
    "    \"verbose\": False,\n",
    "    \"warn\": True,\n",
    "}"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Generating reasoning chains and extracting answers in one call"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Generating worldtree...\n"
     ]
    }
   ],
   "source": [
    "# Generating chains-of-thought and answer extractions\n",
    "worldtree_10.generate(config=config)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As above was a fake call to the mock API we now **load a prepared dataset** with real model answers for the purpose of the tutorial:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [],
   "source": [
    "# loading a pre-generated example for the purpose of this tutorial\n",
    "worldtree_10 = Collection.from_json(\"worldtree_10.json\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "| Name      |   Train | Valid   | Test   |\n",
       "|-----------|---------|---------|--------|\n",
       "| worldtree |      10 | -       | -      |\n",
       "\n",
       "Not loaded: ['aqua', 'asdiv', 'commonsense_qa', 'entailment_bank', 'gsm8k', 'mawps', 'med_qa', 'med_qa_open', 'medmc_qa', 'mmlu_anatomy', 'mmlu_clinical_knowledge', 'mmlu_college_biology', 'mmlu_college_medicine', 'mmlu_medical_genetics', 'mmlu_professional_medicine', 'open_book_qa', 'pubmed_qa', 'qed', 'strategy_qa', 'svamp']"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "worldtree_10"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "4f57e102",
   "metadata": {},
   "source": [
    "### 2.2 Using your own API to create reasoning chains and extract answers"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "ThoughtSource can connect to external AI service providers such as the [OpenAI API](https://openai.com/api/) or the [Hugging Face Hub](https://huggingface.co/docs/hub/index). Set your token, 'api_service' and 'engine' parameters accordingly.\n",
    "\n",
    "In this tutorial we will use the Hugging Face Hub, which is for free. You can make an account and then copy your token from the Hugging Face settings page.\n",
    "\n",
    "To use the API you need to set the environment variable `HUGGINGFACEHUB_API_TOKEN` to your API token:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [],
   "source": [
    "# os.environ[\"HUGGINGFACEHUB_API_TOKEN\"] = \"<token>\"   # <--- set token (can be found in your Hugging Face settings page)\n",
    "# os.environ[\"OPENAI_API_KEY\"] = \"<token>\"  # <--- Set token for which API you want to use"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Configuration of the input and parameters of the language model \n",
    "config={\n",
    "    # We compare three different prompts for the chain of thought generation:\n",
    "    # \"Answer: Let's think step by step.\" and 'Answer: We should think about this step by step.', and \"Answer: First,\" \n",
    "    \"cot_trigger_keys\": ['kojima-01','kojima-02', 'kojima-03'],\n",
    "\n",
    "    # We use the same answer extraction prompt for all three prompts\n",
    "    # \"Therefore, among A through D, the answer is\"\n",
    "    \"answer_extraction_keys\": ['kojima-A-D'], \n",
    "    \n",
    "    \"api_service\": \"huggingface_hub\", # <--- Select which API you want to use\n",
    "    \"engine\": \"google/flan-t5-xl\", # <--- Select which model you want to use\n",
    "    \"temperature\": 0,\n",
    "    \"max_tokens\": 512,\n",
    "    \"verbose\": False,\n",
    "    \"warn\": True,\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Loading worldtree...\n"
     ]
    }
   ],
   "source": [
    "# Loading just one sample from the dataset so it runs faster\n",
    "worldtree = Collection([\"worldtree\"], verbose=False)\n",
    "worldtree_1 = worldtree.select(split=\"train\", number_samples=1) # just selecting 1 sample, change that if you want to run on more"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is a **two step process:**\n",
    " - The language model first answers a question with a detailed reasoning chain.\n",
    " - The language model then extracts the answer from its own reasoning chain.\n",
    "\n",
    "The code does it automatically at once, but it helpful to have in mind the underlying two step process. </br>\n",
    "Therefore you need two API calls for each example as shown in the warning."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Calling the external API will approximately take 30 seconds for this example\n",
    "worldtree_1.generate(config=config)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If your cannot press 'y' because your coding environment is not interactive, set \"warn\" to false in the config to deactivate the warning."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "4f57e102",
   "metadata": {},
   "source": [
    "## 3. Evaluation of model answers and downloading all results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'id': '0d016d52-5190-40dc-aa03-d4c1bd5c1d8b',\n",
       " 'fragments_version': '0.01',\n",
       " 'instruction': None,\n",
       " 'cot_trigger': 'kojima-02',\n",
       " 'cot_trigger_template': '{instruction}\\n\\n{question}\\n{answer_choices}\\n\\n{cot_trigger}',\n",
       " 'prompt_text': '',\n",
       " 'cot': ' The color of the skin, size of rotten spots, and length of worms inside are all characteristics of the tomato that Steven picked. None of these characteristics can be inherited and changed over several generations. The only characteristic that can be inherited and changed over several generations is the number of broken branches. Therefore, the correct answer is D) number of broken branches.',\n",
       " 'answers': [{'id': '6265eb6f-f191-419a-8814-b67c8005418a',\n",
       "   'answer_extraction': 'kojima-A-D',\n",
       "   'answer_extraction_template': '{instruction}\\n\\n{question}\\n{answer_choices}\\n\\n{cot_trigger}{cot}\\n{answer_extraction}',\n",
       "   'answer_extraction_text': '',\n",
       "   'answer': ' D) number of broken branches.',\n",
       "   'answer_from_choices': 'A',\n",
       "   'correct_answer': False}],\n",
       " 'author': 'your_name',\n",
       " 'date': '2023/01/27 18:22:15',\n",
       " 'api_service': 'openai',\n",
       " 'model': \"{'name': 'text-davinci-003', 'temperature': 0, 'max_tokens': 512}\",\n",
       " 'comment': '',\n",
       " 'annotations': []}"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "worldtree_10['worldtree']['train'][5]['generated_cot'][1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "f4cfc94377bb462ca9118211d95fb0df",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "  0%|          | 0/10 [00:00<?, ?ex/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "{'worldtree': {'train': {'accuracy': {'text-davinci-003': {'None_kojima-01_kojima-A-D': 0.7,\n",
       "     'None_kojima-02_kojima-A-D': 0.8,\n",
       "     'None_kojima-03_kojima-A-D': 0.8}}}}}"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "worldtree_10.evaluate(overwrite=True)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Save the file, which now includes the dataset and your generated reasoning chains, extracted answers and evaluation results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Save the file that includes all generated chains of thought and answer extractions a the evaluation results\n",
    "worldtree_10.dump(\"worldtree_10.json\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you used your own API, use this code to evaluate and save the results:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "# print(worldtree_1.evaluate())\n",
    "# worldtree_1.dump(\"worldtree_1.json\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "4f57e102",
   "metadata": {},
   "source": [
    "## 4. Inspect the model outputs in the Web Tool"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Here is the link: **[ThoughtSource Annotator](http://thought.samwald.info:3000/)**"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Just **upload your just downloaded 'worldtree_10.json' file** to the web tool."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "base",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.6"
  },
  "orig_nbformat": 4,
  "vscode": {
   "interpreter": {
    "hash": "40d3a090f54c6569ab1632332b64b2c03c39dcf918b08424e98f38b5ae0af88f"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
